Parallel SHVC decoder: Implementation and analysis
The new Scalable High efficiency Video Coding (SHVC) standard is based on a multi-loop coding structure which requires the full decoding of all intermediate layers. Decoding complexity then becomes a real issue, especially for real-time decoding of ultra-high video resolutions. A parallel processing architecture is proposed to reduce both the decoding time and the latency of the SHVC decoder. The proposed solution combines the high-level parallel processing solutions defined in the HEVC standard with an extension of frame-based parallelism. The latter enables the decoding of several spatial and temporal SHVC frames in parallel to improve both decoding frame rate and latency. The wavefront parallel processing solution is used at a coarser level of granularity. The proposed hybrid parallel processing approach achieves a near-optimal speedup and provides a good trade-off between decoding time, latency and memory usage. On a 6-core Xeon processor, the parallel SHVC decoder performs real-time decoding of 1600p60 video.
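As a rough illustration of the wavefront parallel processing mentioned in this abstract, the sketch below computes an idealized decoding schedule for a grid of coding tree units (CTUs). It relies on the HEVC wavefront rule that a CTU can start once its left neighbour and the above-right CTU are done. The unit decoding time per CTU, the unlimited thread count, and the function name `wavefront_schedule` are assumptions made for illustration and do not come from the paper.

```python
def wavefront_schedule(rows, cols):
    """Earliest start step of each CTU under an idealized wavefront model.

    Assumptions (not from the paper): every CTU takes one time step and
    there are enough threads to run every ready CTU at once.
    """
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            deps = []
            if c > 0:
                # Left neighbour in the same CTU row.
                deps.append(start[r][c - 1])
            if r > 0:
                # Above-right CTU (clamped at the last column).
                deps.append(start[r - 1][min(c + 1, cols - 1)])
            start[r][c] = 1 + max(deps) if deps else 0
    return start


schedule = wavefront_schedule(rows=4, cols=8)
makespan = schedule[-1][-1] + 1          # steps needed with wavefront
serial_steps = 4 * 8                     # steps needed by one thread
ideal_speedup = serial_steps / makespan
print(makespan, round(ideal_speedup, 2))
```

Under these assumptions each row starts two steps after the row above it, which is why wavefront speedup alone saturates on small frames and why a decoder might combine it with frame-level parallelism.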
ConvNeXt-ChARM: ConvNeXt-based Transform for Efficient Neural Image Compression
Over the last few years, neural image compression has gained wide attention
from research and industry, yielding promising end-to-end deep neural codecs
outperforming their conventional counterparts in rate-distortion performance.
Despite significant advancement, current methods, including attention-based
transform coding, still need to be improved in reducing the coding rate while
preserving the reconstruction fidelity, especially in non-homogeneous textured
image areas. Those models also require more parameters and a higher decoding
time. To tackle the above challenges, we propose ConvNeXt-ChARM, an efficient
ConvNeXt-based transform coding framework, paired with a compute-efficient
channel-wise auto-regressive prior to capture both global and local contexts
from the hyper and quantized latent representations. The proposed architecture
can be optimized end-to-end to fully exploit the context information and
extract compact latent representation while reconstructing higher-quality
images. Experimental results on four widely-used datasets showed that
ConvNeXt-ChARM brings consistent and significant BD-rate (PSNR) reductions
estimated on average to 5.24% and 1.22% over the versatile video coding (VVC)
reference encoder (VTM-18.0) and the state-of-the-art learned image compression
method SwinT-ChARM, respectively. Moreover, we provide model scaling studies to
verify the computational efficiency of our approach and conduct several
objective and subjective analyses to bring to the fore the performance gap
between the next generation ConvNet, namely ConvNeXt, and Swin Transformer.
Comment: arXiv admin note: substantial text overlap with arXiv:2307.02273; text overlap with arXiv:2307.0609
Multi-core software architecture for the scalable HEVC decoder
The scalable high efficiency video coding (SHVC) standard aims to provide temporal, spatial and quality scalability. In this paper we investigate a pipelined and parallel software architecture for the SHVC decoder. The proposed architecture is based on the OpenHEVC software, which implements the high efficiency video coding (HEVC) decoder. The architecture of the SHVC decoder enables two levels of parallelism. The first level decodes the base layer and the enhancement layers in parallel. The second level decodes both the base layer and the enhancement layers in parallel through the HEVC high-level parallel processing solutions, including tiles and wavefront. To the best of our knowledge, it is the first real-time and parallel software implementation of the SHVC decoder. On an Intel Xeon processor running at 3.2 GHz, the SHVC decoder decodes the 1600p enhancement layer at 40 fps for x1.5 spatial scalability using six concurrent threads.
4K real time video streaming with SHVC decoder and GPAC player
This paper presents the first 4Kp30 end-to-end video streaming demonstration based on the upcoming Scalable High efficiency Video Coding (SHVC) standard. The optimized and parallel SHVC decoder is used within the GPAC player to decode and display the received SHVC layers in real time. The SHVC reference software model (SHM) is used to encode the original 4K video in two spatial scalability layers: the base layer at 1080p resolution and the enhancement layer at 2160p resolution. The SHVC bitstream is encapsulated with the GPAC multimedia library into the MP4 file format. The GPAC player at the server side broadcasts the MP4 content in MPEG-2 TS. At the client side, the GPAC player receives the SHVC video packets, which are decoded by the SHVC decoder and then rendered in real time by the player. The GPAC player provides an interactive interface for switching between displaying the base and the enhancement layers.
Ensemble Learning for Efficient VVC Bitrate Ladder Prediction
Changing the encoding parameters, in particular the video resolution, is a
common practice before transcoding. To this end, streaming and broadcast
platforms benefit from so-called bitrate ladders to determine the optimal
resolution for given bitrates. However, the task of determining the bitrate
ladder can be challenging because, on the one hand, so-called fit-for-all static
ladders would waste bandwidth, and on the other hand, fully specialized ladders
are often not affordable in terms of computational complexity. In this paper,
we propose an ML-based scheme for predicting the bitrate ladder based on the
content of the video. The baseline of our solution predicts the bitrate ladder
using two constituent methods, which require no encoding passes. To further
enhance the performance of the constituent methods, we integrate a conditional
ensemble method to aggregate their decisions, with a negligible number
of encoding passes. The experiment, carried out on the optimized software
encoder implementation of the VVC standard, called VVenC, shows significant
performance improvement. When compared to static bitrate ladder, the proposed
method can offer about 13% bitrate reduction in terms of BD-BR with a
negligible additional computational overhead. Conversely, when compared to the
fully specialized bitrate ladder method, the proposed method can offer about
86% to 92% complexity reduction, at the cost of only 0.8% to 0.9% coding
efficiency drop in terms of BD-BR.
LAR Image transmission over fading channels: a hierarchical protection solution
The aim of this paper is to present an efficient scheme to transmit a compressed digital image over a non-frequency-selective Rayleigh fading channel. The proposed scheme is based on the Locally Adaptive Resolution (LAR) algorithm, and a Reed-Solomon error-correcting code is used to protect the data against channel errors. In order to optimize the protection rate and ensure better protection, we introduce an Unequal Error Protection (UEP) strategy that takes the hierarchy of the information into account. The digital communication system also includes appropriate interleaving and differential modulation. Simulation results clearly show that our scheme is an efficient solution for image transmission over wireless channels and provides a high quality of service, outperforming the JPWL scheme in high bit error rate conditions.
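To make the unequal error protection idea concrete, the sketch below assigns stronger Reed-Solomon codes to more important layers of a hierarchical bitstream. It uses the standard property that an RS(n, k) code corrects up to (n - k) / 2 symbol errors per codeword. The layer names and the specific (n, k) pairs are hypothetical and are not taken from the paper.

```python
def rs_correctable_symbols(n, k):
    """Symbol errors an RS(n, k) code can correct per codeword."""
    if not 0 < k < n:
        raise ValueError("need 0 < k < n")
    return (n - k) // 2


def code_rate(n, k):
    """Fraction of each codeword that carries payload data."""
    return k / n


# Hypothetical protection plan: the coarsest layer matters most for the
# reconstructed image, so it gets the most parity symbols.
protection_plan = {
    "coarse_layer": (255, 191),
    "middle_layer": (255, 223),
    "detail_layer": (255, 239),
}

summary = {
    layer: (rs_correctable_symbols(n, k), round(code_rate(n, k), 3))
    for layer, (n, k) in protection_plan.items()
}
for layer, (t, rate) in summary.items():
    print(f"{layer}: corrects {t} symbols per codeword, code rate {rate}")
```

The trade-off shown is the usual one: the coarse layer tolerates four times as many symbol errors as the detail layer, at the price of a lower code rate.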
Bitrate Ladder Prediction Methods for Adaptive Video Streaming: A Review and Benchmark
HTTP adaptive streaming (HAS) has emerged as a widely adopted approach for
over-the-top (OTT) video streaming services, due to its ability to deliver a
seamless streaming experience. A key component of HAS is the bitrate ladder,
which provides the encoding parameters (e.g., bitrate-resolution pairs) to
encode the source video. The representations in the bitrate ladder allow the
client's player to dynamically adjust the quality of the video stream based on
network conditions by selecting the most appropriate representation from the
bitrate ladder. The most straightforward and lowest complexity approach
involves using a fixed bitrate ladder for all videos, consisting of
pre-determined bitrate-resolution pairs known as one-size-fits-all. Conversely,
the most reliable technique relies on intensively encoding all resolutions over
a wide range of bitrates to build the convex hull, thereby optimizing the
bitrate ladder for each specific video. Several techniques have been proposed
to predict content-based ladders without performing a costly exhaustive search
encoding. This paper provides a comprehensive review of various methods,
including both conventional and learning-based approaches. Furthermore, we
conduct a benchmark study focusing exclusively on various learning-based
approaches for predicting content-optimized bitrate ladders across multiple
codec settings. The considered methods are evaluated on our proposed
large-scale dataset, which includes 300 UHD video shots encoded with software
and hardware encoders using three state-of-the-art encoders, including
AVC/H.264, HEVC/H.265, and VVC/H.266, at various bitrate points. Our analysis
provides baseline methods and insights, which will be valuable for future
research in the field of bitrate ladder prediction. The source code of the
proposed benchmark and the dataset will be made publicly available upon
acceptance of the paper.
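The exhaustive approach described in this abstract can be sketched as follows: encode every resolution at several bitrates, then keep only the operating points that improve quality as bitrate grows. The code below builds this set as a simple Pareto front, which is a simplification of a true convex hull. The sample bitrates, quality scores, and the function name `pareto_ladder` are hypothetical and serve only as illustration.

```python
def pareto_ladder(points):
    """Keep the (bitrate, quality, resolution) points worth streaming.

    A point is kept only if its quality is strictly higher than that of
    every point with a lower or equal bitrate. This is a Pareto front,
    used here as a simple stand-in for the convex hull.
    """
    ladder = []
    best_quality = float("-inf")
    # Sort by bitrate, and by descending quality within the same bitrate.
    for bitrate, quality, resolution in sorted(
        points, key=lambda p: (p[0], -p[1])
    ):
        if quality > best_quality:
            ladder.append((bitrate, quality, resolution))
            best_quality = quality
    return ladder


# Hypothetical measurements: (bitrate in kbps, quality score, resolution).
measurements = [
    (1000, 70, "540p"), (1000, 65, "1080p"),
    (3000, 80, "540p"), (3000, 82, "1080p"),
    (6000, 88, "1080p"), (6000, 90, "2160p"),
]
ladder = pareto_ladder(measurements)
print(ladder)
```

With these made-up numbers the best resolution switches from 540p to 1080p to 2160p as bitrate increases, which is the crossover behaviour a content-aware ladder tries to capture without running every encode.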
Machine Learning based Efficient QT-MTT Partitioning Scheme for VVC Intra Encoders
The next-generation Versatile Video Coding (VVC) standard introduces a new
Multi-Type Tree (MTT) block partitioning structure that supports Binary-Tree
(BT) and Ternary-Tree (TT) splits in both vertical and horizontal directions.
This new approach leads to five possible splits at each block depth and thereby
improves the coding efficiency of VVC over that of the preceding High
Efficiency Video Coding (HEVC) standard, which only supports Quad-Tree (QT)
partitioning with a single split per block depth. However, MTT has also
considerably increased encoder computational complexity. In this paper, a
two-stage learning-based technique is proposed to tackle the complexity
overhead of MTT in VVC intra encoders. In our scheme, the input block is first
processed by a Convolutional Neural Network (CNN) to predict its spatial
features through a vector of probabilities describing the partition at each 4x4
edge. Subsequently, a Decision Tree (DT) model leverages this vector of spatial
features to predict the most likely splits at each block. Finally, based on
this prediction, only the N most likely splits are processed by the
Rate-Distortion (RD) process of the encoder. In order to train our CNN and DT
models on a wide range of image content, we also propose a public VVC frame
partitioning dataset based on existing image datasets encoded with the VVC
reference software encoder. Our proposal relying on the top-3 configuration
reaches 46.6% complexity reduction for a negligible bitrate increase of 0.86%.
A top-2 configuration enables a higher complexity reduction of 69.8% for 2.57%
bitrate loss. These results show a better trade-off between VTM intra
coding efficiency and complexity reduction compared to the state-of-the-art
solutions.
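The final pruning step in this abstract, where only the N most likely splits go through the rate-distortion search, can be sketched as below. The sketch assumes a per-split probability is already available for the block. The split labels, the probability values, and the function name `top_n_splits` are hypothetical, and the CNN and decision tree stages themselves are not modelled.

```python
import heapq


def top_n_splits(split_probs, n):
    """Return the n split modes with the highest predicted probability."""
    return heapq.nlargest(n, split_probs, key=split_probs.get)


# Hypothetical predicted probabilities for one block (not from the paper).
predicted = {
    "QT": 0.05,
    "BT_H": 0.40,
    "BT_V": 0.30,
    "TT_H": 0.15,
    "TT_V": 0.10,
}

top3 = top_n_splits(predicted, 3)  # candidates kept for the RD search
top2 = top_n_splits(predicted, 2)  # more aggressive pruning
print(top3, top2)
```

A smaller N skips more rate-distortion evaluations, which matches the trade-off reported above between the top-3 and top-2 configurations.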
H2B2VS (HEVC Hybrid Broadcast Broadband Video Services) – building innovative solutions over hybrid networks
Broadcast and broadband networks continue to be separate worlds in the video consumption business. Some initiatives such as HbbTV have built a bridge between both worlds, but its application is almost limited to providing links over the broadcast channel to content providers’ applications such as Catch-up TV services. In practice, the user uses either one network or the other. H2B2VS is a Celtic-Plus project aiming to exploit the potential of real hybrid networks by implementing efficient synchronization mechanisms and using new video coding standards such as High Efficiency Video Coding (HEVC). The goal is to develop successful hybrid network solutions that enable value-added services with optimum bandwidth usage in each network and with clear commercial applications. An example of the potential of this approach is the transmission of Ultra-HD TV by sending the main content over the broadcast channel and the required complementary information over the broadband network. This technology can also be used to improve the lives of people with disabilities: deaf viewers receive through the broadband network a sign language translation of a programme sent over the broadcast channel, and the TV set then displays this translation in an inset window. One of the most important contributions of the project is developing and testing synchronization methods between two different networks that offer unequal qualities of service with significant differences in delay and jitter. In this paper, the main technological contributions of the project are described, including SHVC, the scalable extension of HEVC, with a special focus on the synchronization solution adopted by MPEG and DVB. The paper also presents some of the implemented practical use cases, such as the sign language translation described above, and their performance results, in order to evaluate the commercial application of this type of solution.